Extreme Extraction: Only One Hour per Relation
Abstract
Information Extraction (IE) aims to automatically generate a large knowledge base from natural language text, but progress remains slow. Supervised learning requires copious human annotation, while unsupervised and weakly supervised approaches do not deliver competitive accuracy. As a result, most fielded applications of IE, as well as the leading TAC-KBP systems, rely on significant amounts of manual engineering. Even “Extreme” methods, such as those reported in Freedman et al. [11], require about 10 hours of expert labor per relation. This paper shows how to reduce that effort by an order of magnitude. We present a novel system, INSTAREAD, that streamlines authoring with an ensemble of methods: 1) encoding extraction rules in an expressive and compositional representation, 2) guiding the user to promising rules based on corpus statistics and mined resources, and 3) introducing a new interactive development cycle that provides immediate feedback — even on large datasets. Experiments show that experts can create quality extractors in under an hour and even NLP novices can author good extractors. These extractors equal or outperform ones obtained by comparably supervised and state-of-the-art distantly supervised approaches.
Similar resources
Iditarod Sled Dog Race
On March 3rd, sixty-six teams from around the world will line up in Anchorage, Alaska to kick off the 40th Annual Iditarod Sled Dog Race. Each team is powered by 16 “Alaskan sled dogs” and one human “musher,” who leads the pack across 1,100 miles of Alaskan wilderness to the Bering Sea town of Nome. The race takes only 8–10 days to complete, with the dogs running 100 miles a day at speeds be...
On the efficacy of per-relation basis performance evaluation for PPI extraction and a high-precision rule-based approach
BACKGROUND: Most previous Protein-Protein Interaction (PPI) studies evaluated their algorithms' performance based on "per-instance" precision and recall, in which the instances of an interaction relation were evaluated independently. However, we argue that this standard evaluation method should be revisited. In a large corpus, the same relation can be described in various different forms and, in...
A Vegetable Extract Used as an Antigen for the Kahn Test: An Experimental Trial.
It has been shown that a substance with properties akin to those of syphilis "antigen" can be extracted from commercial soya bean flour (Stevenson, 1949). The most satisfactory method is the following: 25 g. commercial soya flour is extracted successively with 100, 75, 75, 75 ml. pure ether. The flour is then dried and weighed and extracted for one hour at 80° C. with 95 per cent. ethyl alcoh...
Clinical factors associated with extreme sleep apnoea [AHI>100 events per hour] in Peruvian patients: A case-control study–A preliminary report
PURPOSE: The severity of obstructive sleep apnoea (OSA) ranges from mild or moderate to severe sleep apnoea. However, there is no information available on the clinical characteristics associated with cases involving more than 100 events per hour. This is a preliminary report and our goal was to characterise the demographics and sleep characteristics of patients with Extreme OSA and compare with ...
Commute Time and Subjective Well-Being in Urban China
Using data from the 2010 China Family Panel Studies, this study investigates the association between commute time and subjective well-being in a sample of 16- to 65-year-old employees in urban China. We find evidence that a longer commute time is associated with lower levels of both life satisfaction and happiness, especially when the commute times are extreme (≥ 1 hour per day). A multiple media...
Journal title:
- CoRR
Volume: abs/1506.06418
Publication date: 2015